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It has been estimated that about 30% of the genes in the hu- 
man genome are regulated by microRN As (miRNAs) . These are short 
RNA sequences that can down-regulate the levels of mRNAs or pro- 
teins in animals and plants. Genes regulated by miRNAs are called 
targets. Typically, methods for target prediction are based solely on 
sequence data and on the structure information. In this paper we pro- 
pose a Bayesian graphical modeling approach that infers the miRNA 
regulatory network by integrating expression levels of miRNAs with 
their potential mRNA targets and, via the prior probability model, 
with their sequence/structure information. We use a directed graph- 
ical model with a particular structure adapted to our data based 
on biological considerations. We then achieve network inference us- 
ing stochastic search methods for variable selection that allow us to 
explore the huge model space via MCMC. A time-dependent coef- 
ficients model is also implemented. We consider experimental data 
from a study on a very well-known developmental toxicant causing 
neural tube defects, hyperthermia. Some of the pairs of target gene 
and miRNA we identify seem very plausible and warrant future inves- 
tigation. Our proposed method is general and can be easily applied to 
other types of network inference by integrating multiple data sources. 

1. Introduction. One of the major tasks and challenges in the post- 
genomics era is to decipher how genes and their products (proteins) are 
regulated. Regulation can happen at transcriptional, post-transcriptional, 
translational and post-translational level. Transcription is the process of 
synthesizing a stretch of ribonucleic acids (RNA) based on a specific DNA 
sequence. Transcriptional regulation can affect whether or not a specific 
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RNA is transcribed as well as the amount of RNA produced. RNA can be 
regulated post-transcriptionally through degradation or modification of the 
RNA strand, which can affect its function. A segment of RNA can interact 
with other genes or proteins or can encode a protein. Translation, the pro- 
cess of forming a protein based on an RNA sequence, can also be positively 
or negatively regulated. Proteins often undergo post-translational modifi- 
cations, which can affect their function. An abundant class of short (~22 
nucleotide) RNAs, known as microRNAs (miRNAs), plays crucial regula- 
tory roles in animals and plants [Farh et al. (2005)]. It has been estimated 
that at least 30% of the genes in human genomes are regulated by miRNAs; 
see Lewis, Burge and Bartel (2005) and also Rajewsky (2006). Genes regu- 
lated by miRNAs are generally called "targets." The actual mechanism of 
miRNA regulation is still an active area of research and the complete picture 
of the regulatory mechanism is still to be understood [Thermann and Hentze 
(2007)]. According to current knowledge, it is believed that miRNAs regu- 
late their targets either by degrading mRNA post-transcriptionally [Bagga 
et al. (2005)], or by suppressing initiation of protein synthesis [Pillai et al. 

(2005) ], and/or by inhibiting translation elongation after initiation of pro- 
tein synthesis [Petersen et al. (2006)]. The biosynthesis and maturation of 
miRNAs is composed of distinctive events. Briefly, the miRNA precursor is 
processed by Dicer to produce the mature, single-stranded molecule that is 
incorporated into the RNA-induced silencing complex (RISC) and possesses 
a 6-8 bases of "seed" sequence that mainly targets complementary sequences 
within the 3'-untranslated regions (UTR) of mRNA transcripts. 

Many algorithms have been developed to predict potential target se- 
quences for miRNAs based on their specific sequence and structure char- 
acteristics. These algorithms mainly use sequence information, hybridiza- 
tion energy for structure prediction and cross-species comparisons [Rajewsky 

(2006) ]. Target prediction algorithms generally take into account different 
factors that influence miRNA/target interactions, such as seed match com- 
plementarity, 3'-UTR seed match context, the conservation, favorability of 
free energy binding and binding site accessibility. Some of the more widely 
used algorithms include the following: TargetScan of Lewis, Burge and Bar- 
tel (2005), PicTar of Krek et al. (2005), miRanda of John et al. (2004), PITA 
of Kertesz et al. (2007) and DIANA-microT of Kiriakidou et al. (2004). A 
comprehensive review of these and other methods can be found in Yoon 
and Micheli (2006). Typically, a large amount (e.g., hundreds to thousands) 
of potential targets are predicted by these algorithms, and it can be over- 
whelming for researchers to search through the candidate targets for those 
which play critical regulatory roles under particular experimental or clinical 
conditions. 

Our goal is to develop a statistical approach to identify a small set of po- 
tential targets with high confidence, making future experimental validation 
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feasible. Since miRNAs down-regulate the expression of their targets, ex- 
pression profile of miRNAs and their potential targets can be used to infer 
their regulatory relationships. We propose a Bayesian graphical modeling 
approach that infers the miRNA regulatory network by integrating these 
two types of expression levels. We use a directed graphical model with a 
particular structure adapted to our data based on biological considerations. 
We take into account current knowledge on the down-regulation effect of 
mRNA expression (by miRNA) by imposing constraints on the sign of the 
regression coefficients of our proposed model. The model also integrates the 
sequence/structure information, as generated by widely used target predic- 
tion algorithms, via the prior probability model. We then achieve network 
inference using stochastic search methods for variable selection. 

We consider experimental data from a study on a very well-known de- 
velopmental toxicant causing neural tube defects, hyperthermia. We have 
available 23 mouse miRNAs and a total of 1297 potential targets. We in- 
fer their regulatory network under two different treatment conditions and 
also investigate time-dependent regulatory associations. Some of the pairs of 
target gene and miRNA we identify seem very plausible and warrant future 
investigation. 

Huang, Morris and Frey (2007), Huang, Frey and Morris (2008) have pro- 
posed a Bayesian model for the regulatory process of targets and miRNAs 
which is similar to the one we propose here. However, in their model for- 
mulation the authors consider regression coefficients that are constant with 
respect to the mRNAs, while our formulation allows a more efficient way 
of selecting gene-miRNA pairs. Also, in order to achieve posterior inference, 
we implement a full MCMC procedure while Huang, Morris and Frey (2007) 
adopt a variational method that only approximates the posterior distribu- 
tion. More importantly, Huang, Morris and Frey (2007) restrict their search 
algorithm to a preselected subset of possible gene-miRNA relations, which 
they select based on the available sequence information, therefore excluding 
a priori a large number of associations that could instead occur in specific 
experimental conditions, such as hyperthermia. 

This paper is organized as follows. Section 2 introduces the experimental 
study and describes the available data, that is, the expression data of miR- 
NAs and their potential mRNA targets, and the corresponding association 
scores. Section 3 illustrates the proposed modeling approach via a Bayesian 
graphical model and describes the prior model and the variable selection 
scheme. Section 4 describes how to perform posterior inference and Section 
5 provides a detailed analysis of the miRNA regulatory network reconstruc- 
tion based on the available data. Section 6 concludes the paper. 

2. Neural tube defects. Neural tube defects (NTDs) are some of the 
most common congenital defects, with approximately 12 per day in the 
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United States [Finnell et al. (2000)]. NTDs are generally related to failure of 
embryonic neural folds to fuse properly along the neuroaxis during develop- 
ment. Studies in both humans and animals suggest a complex genetic com- 
ponent to NTDs, likely involving multiple loci, together with environmental 
factors. MicroRNAs are believed to play important regulatory roles in mouse 
development and human disease [see, for example, Conrad, Barrier and Ford 
(2006)], although detailed regulatory mechanisms are still unknown. 

In this paper we consider experimental data from a study on a very well- 
known developmental toxicant causing neural tube defects, hyperthermia. In 
the study mice are used as the animal model to study NTDs. Time- mated 
female C57B1/6 mice were exposed in vivo to a 10 minute hyperthermia or 
control treatment on gestational day 8.5, when the neural folds are fusing 
to form the neural tube. Four litters were collected for each treatment at 
5, 10 and 24 hours after exposure. Each litter was treated as a single bio- 
logical sample. MiRNAs and mRNAs were extracted from each sample for 
expression analysis. 

2.1. miRNA expression levels. As the regulatory network can be very 
complex, we focus on a small sets of mRNA targets with high confidence. 
With a limited budget available, a pilot study was performed to screen the 
expression profiles of most of the known (~240 ) mouse microRNAs based on 
one set of samples, for both heat shock and control at 4 different time points, 
and using TaqMan miRNA RTPCR assays available at the time (Applied 
Biosystems, Foster City, CA; provided in collaboration with Ambion, Austin, 
TX). Of the 240 miRNA evaluated, 50 had none or very low expression at 
all time points, while 86 had a 2-fold or greater change in expression in 
response to hyperthermia exposure at one or more time point. From this set 
of 86 miRNA, we chose a subset of 23 miRNA whose patterns of expression 
were interesting enough for further analysis and obtained replicate sample 
sets. The complete experiment was therefore carried out using only this set 
of 23 miRNAs. Results from the analysis of these data, and their biological 
interpretation, are clearly limited by this initial choice. 

MicroRNA was extracted from each sample at each time point under 
each experimental condition. Two technical replicates were prepared for 
RTPCR quantification to confirm the technical reproducibility. In RTPCR 
experiments, fluorescence techniques are used to detect the ampliflcation of 
miRNAs to assess their abundance. A fluorescence threshold is determined 
for an experiment, and the cycle number, which reaches the predetermined 
threshold level of log2-based fluorescence, is defined as the Ct number. An 
inverse linear relationship exists between Ct number and the logarithm of 
input quantity of the gene when the amplification efficiency is perfect [Pfaffl 
(2001)]. The Ct numbers of the miRNA technical replicates were averaged 
across the two technical replicates. 
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2.2. Target prediction via sequence and structure information. Four of 
the most widely used algorithms, miRanda, TargetScan, PITA and PicTar, 
were used in our study to retrieve the sequence and structure information 
for target prediction. The algorithm miRanda [see John et al. (2004) and 
Enright et al. (2003)] computes optimal sequence complementarity between 
mature microRNAs and a mRNA using a weighted dynamic programming 
algorithm with weights that are position-dependent and that reflect the 
relative importance of the 5' and 3' regions. Its alignment score is a weighted 
sum of match and mismatch scores for base pairs and gap penalties. The free- 
energy of the formation of the microRNA:mRNA duplex is used by miRanda 
as a filtering step. PITA of Kertesz et al. (2007) focuses on the overall effect of 
all potential binding sites combined together on the given UTR. Pictar [see 
Krek et al. (2005)] utilizes genome-wide alignment among species to take 
conservation into consideration. Finally, TargetScan of Lewis, Burge and 
Bartel (2005) utilizes two orthogonal scores, one is the total context score, 
and the other independent score is the probability of conserved targeting. 
More details on each algorithm can be found in the original papers. 

The prediction scores using the four algorithms, miRanda, TargetScan, 
PITA and PicTar, can be obtained from the respective websites. Predictions 
by PicTar were obtained from the PicTar site^. A zero or absent PicTar 
score indicates that the raw score did not exceed a prespecified threshold, 
that is, the algorithm suggests no indication of a regulatory association. The 
current release (September 2009) comprising 1,209,841 predicted microRNA 
target sites in 26,697 mouse gene isoforms for 491 mouse miRNAs, gener- 
ated by the miRanda algorithm of John et al. (2004), was downloaded from 
MICR0RNA.ORG; see Betel et al. (2008). PITA Catalog version 6 (31-Aug-08) 
was downloaded from Segal lab's website'^. The target scores from all predic- 
tions were used in our study. TargetscanMouse 5.1 (released April 2009) was 
downloaded from the internet^. Scores for preferential conservation of the 
sites (Aggregate PCT) and the context of the sites within the UTR (total 
context score) were parsed and used in our study. Matlab scripts were writ- 
ten to retrieve the prediction scores from each algorithm using the RefSeq 
Ids of all potential targets downloaded for the 23 mouse miRNAs of interest. 

2.3. Target mRNA expression levels. RNA was extracted from each sam- 
ple at each time point and hybridized to GE Codelink Mouse Whole Genome 
Microarrays (GE Healthcare Life Sciences, Piscataway, NJ). The slides were 
scanned and mRNA expression levels were quantified. One biological sample 



^http : //pictar . mdc-berlin . de/ . 

^http : //genie . weizmann. ac . il/pubs/mir07/mir 07_data.html . 
http : //www. target scan. org/ cgi-bin/targetscan/data_download. cgi?db=mmu_50. 
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was not prepared properly at hour 10 in the control group, and therefore 
discarded. 

The RefSeq Ids of the probes spotted on the Codelink microarrays were 
linked to the retrieved potential targets of the 23 miRNAs previously identi- 
fied. The mRNAs were included in the analysis only if they were among the 
potential targets predicted by the four algorithms described above. Genes 
with missing or negative values were excluded from the analysis. The expres- 
sion levels of the remaining mRNAs were then log2 transformed so that both 
miRNA and mRNA expression were on the log2 scale. A total of 1297 poten- 
tial targets were included in the final analysis. The transformed expressions 
across the 3 time points were centered by subtracting their means. 

3. Model. We have available expression levels on a set of miRNAs and 
their potential targets. For each target we are interested in identifying a 
small number of regulatory associations with high confidence. We have also 
available sequence information for target prediction in the form of scores of 
regulatory associations. We propose a Bayesian graphical modeling approach 
that infers the miRNA regulatory network by integrating the expression data 
and, via the prior probability model, the sequence/structure information. An 
important aspect of our methodology is the concept of sparsity, that is, we 
believe that most genes are regulated by a small number of miRNAs. 

3.1. A Bayesian network for gene & miRNA expression. We use a di- 
rected graphical model (Bayesian Network) with a particular structure adapted 
to our data that uses a predetermined ordering of the nodes based on bio- 
logical considerations. This model is able to answer to the baseline question 
of "w/iic/i miRNAs regulate which targets" and, in addition, allows us to 
build a fast computational procedure required in such a high-dimensional 
framework. A graphical representation of the full miRNA network is given 
in Figure 1. Our task is to find a significant subset of edges. 

Graphical models are graphs in which nodes represent random variables 
and the lack of arcs represents conditional independence assumptions; see, 
for example, Cowell et al. (1999). Graphical models provide a compact rep- 
resentation of joint probability distributions. Here we work with a multi- 
variate normal distribution, and therefore with a Graphical Gaussian model 
(GGM). A graph Q and the covariance matrix entirely define a GGM M, 
Ai = {Q,Q). Arcs can be undirected, indicating symmetric dependencies, or 
directed, when there is a direction of the dependence. These dependencies 
can come from prior knowledge or from data analysis. Undirected graphical 
models have a simple definition of independence, for example, two nodes A 
and B are conditionally independent given a third set, C, if all paths be- 
tween the nodes in A and B are separated by a node in C. Directed graphical 
models need a specific ordering of the variables. Graphs that do not allow 
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Fig. 1. Graphical representation of the miRNA regulatory network. 



the presence of cycles are called directed acyclic graphs (DAG) . Conditional 
independencies in a DAG depend on the ordering of the variables. 

We work with a DAG and impose an ordering of the variables such that 
each target can be affected only by the miRNAs and that the miRNAs 
can affect only the targets. Let Z = (Yi, Y2, . . . , Yg,Xi, . . . ,Xa/) with Y = 
(Yi, . . . , Yg) the matrix representing the targets and X = (Xi, . . . ,Xjvf) the 
miRNAs . Specifically, i/ng indicates the normalized averaged log2 gene ex- 
pression of gene g = 1, . . . ,G in sample n = 1, . . . , N . These expression values 
are biological replicates obtained by averaging two technical replicates. Sim- 
ilarly, Xnm indicates the expression of the mth miRNA in sample n, with 
m = 1, . . . , M. We have G = 1297 and M = 23. In addition, we have = 11 
i.i.d. observations under the control status and = 12 i.i.d. observations 
under hyperthermia. We infer the miRNA regulatory network separately 
under the two conditions. 

Our assumptions are that Z is a matrix- variate normal variable with zero 
mean and a variance matrix 17 for its generic row, that is, following Dawid 
(1981) notation. 

In addition, we assume that the target genes are independent condition- 
ally upon the miRNAs, that is, Yj _IL Yj|Xi, . . . ,Xm and, without loss of 
generality, that the miRNAs are independent, that is, Xj _IL Xj. Note that 
the marginal distribution of (Xi,...,Xm) does not affect the regulatory 
network. In a Bayesian Network framework these assumptions imply an or- 
dering of the nodes and, consequently, a likelihood factorization of the type: 

G M 

(1) /(z)=n/(Y,ix)n/(Xm), 

g=l m=l 
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Fig. 2. Structure of the graphical model. 
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agonal element of and f^xx, ^^XY are the blocks of the covariance matrix 
according to the following partition: 



JIyy 

ilxY 



J^xxy 



For m = 1, . . . ,M we have c^fi — ^mm • 

According to current knowledge, miRNAs down-regulate gene expression. 
It therefore seems appropriate to include this information into our statistical 
model. This is achieved by specifying negative regression coefficients via the 
prior model. First, we note that our model is equivalent to the following 
system of equations: 



(2) 



Yi = -X/3i + e„ 



YG = -X/3G + e, 



(JO 1 



where e^g is distributed as a multivariate normal with zero mean and covari- 
ance matrix aglj^. Then, we complete the model specification by specifying 
prior distributions on the regressions coefficients and the error variances. We 
impose our biological constraints by using Gamma distribution priors for the 
positive regressions coefficients, {f3gm\cg) ~ Ga{l,cag), and Inverse-Gamma 
distributions for the error variances, o""^ ~ Ga{{5 + M)/2,d/2). Figure 2 
shows a graphical representation of our model. Circles indicate parameters 
and squares observed random variables. The parameters R and r are in- 
volved in the variable selection and are introduced in the section below. 
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3.2. Prior model for variable selection. The goal of the analysis is to 
find, for each target, a small subset of miRNAs that regulate that target 
with high probability. This can be framed into a variable selection problem. 
Specifically, we can introduce a (G x matrix R with elements r^m, — 

1 if 

the mth miRNA is included in the regression of the ^th target and rgm = 
otherwise. Conditioned upon R, expression (2) is equivalent to a system 
of linear equations where the included regressors are only those miRNAs 
corresponding to rgm = 1. To emphasize the variable selection nature of our 
model, we write it as follows: 

(3) = -X(^)/3g(^) +e^g, 

where is the vector that is formed by taking only the nonzero elements 
of f3g and '^(r) is the matrix that is formed by taking only the corresponding 
columns of X. The goal of our modeling is to infer which elements of the 
vectors /3g's are non-zero, indicating a relationship between the correspond- 
ing genes and miRNAs. This underlying regulatory network is encoded by 
the association matrix R = {rgm}- The elements of the vectors /3g's are then 
stochastically independent, given the regulatory network R, and have the 
following mixture prior distribution: 

(4) vr(/3g„|(Tg,rg„) = rgmGa{l,cag) + {1 - r gm) I [13^^=0] ■ 

In addition, taking into account the regulatory network, we obtain that 
(T~^|R ~ Ga{{6 + kg) /2, d/2), where kg is the number of significative miRNAs 
in the regression of the gth target. 

Mixture priors have been used extensively for variable selection in linear 
regression settings; see George and McCulloch (1993) for univariate regres- 
sion and Brown, Vannucci and Fearn (1998) and Sha et al. (2004) for multi- 
variate models. According to prior (4), when rgm = 0, then /3gm is estimated 
by and the corresponding column of X is excluded from the gth equation 
in model (2). Notice that the dimensions of the matrix X are such that 
there are many more columns than rows. In the domain of classical regres- 
sion, this results in insufficient degrees of freedom to fit the model unless 
constraints are placed on the regression coefficients /3g's. Conversely, this 
problem is readily addressed in the Bayesian paradigm and is known as the 
"small n, large p" framework. The variable selection formulation we adopt 
here overcomes the somehow rigid structure of the model in Brown, Van- 
nucci and Fearn (1998), which does not allow to select different predictors 
for different responses. See also Monni and Tadesse (2009) for an approach 
based on partition models. 
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3.3. Using association scores in the prior model. Scores of possible asso- 
ciations between gene-miRNA pairs obtained from sequence/structure infor- 
mation were used to estimate prior probabilities of miRNAs binding to their 
target genes. Let Sgm denote a generic score for gene g and miRNA m, ob- 
tained, for example, by the PicTar algorithm. As previously described, Sgm 
is either positive or, in the case of a regulatory association that is believed 
to be absent, equal to zero. Also, the PicTar algorithm shrinks small values 
to zero, setting Sgm. — if Sgm < C where ^ is a prespecified threshold used 
by the algorithm. In our model the Bernoulli random variable rgm indicates 
whether there is a relationship between gene g and miRNA m. We choose 
to model the success probability of rgm as a function of the Sgm score as 
follows: 

(5) Pir -llr)- ''Mv + rsgm] 

^^"^--^'^^-l + exp[ry + r.,^]' 

where r is an unknown parameter. We then assume that the elements of R 
are stochastically independent given r. Notice that for Sgm = 0, we have that 
Pifgm = 1) =exp[ry]/(l +exp[?7]), which gives a 0.5 prior probability when 
r/ = 0. Thus, the inverse logit transformation of r] can be interpreted as the 
false negative rate associated with the PicTar thresholding scheme. For a 
score Sgm > we have P{rgm = !)>??) with higher scores yielding higher 
prior probabilities of association. We further specify a hyperprior on r as a 
gamma distribution r ^ Ga{ar,br), ensuring the positivity of the parameter. 

Since we have available multiple prior sources of information, from differ- 
ent sequence/structure algorithms, it makes sense to combine them all by 
incorporating all scores into the prior distribution via additional r parame- 
ters. For example, in the application of Section 5 we combine five different 
scores as 



P{rgm = l\r) 



exp[7? + Tislm + r2slm + '^3Sgm + TjS'^gm + T-S^gm] 
1 + exp[rj + nslm + T2slm + 'T^s'lm + T^^jm + "^S^m] 



with r = (ri, . . . , rs) and where the s^m's, with j = 1, . . . , 5, denote the Pic- 
Tar, miRanda, aggregate Target Scan, total Target Scan and PITA scores, 
respectively. Scores should be normalized to obtain positive values that lie 
in the same range, with bigger values corresponding to stronger prior con- 
nections. 



3.4. Time- dependent coefficients model. The previous model implies that 
the relation between gene g and miRNA m is constant over time. In the ex- 
perimental study for which we developed our model there is no dependence 
between the measurements at different time points, since these observations 
come from independent units. However, one may still wish to incorporate 
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into the model the fact that relations may possibly change with time. This 
can be done by allowing different regression coefficients at different time 
points, as follows: 



Yi = -X/3i - X^/3[ - X^/3[' + e, 



(6) 




where the Y^'s are x 1 vectors and 

X=|^X2j, X^=|^X2 

are the N x M matrices of the observed values, with Xi, X2 and X3 the 
miRNA expressions collected at the first, the second and the third time 
point, respectively. The element f3gm G /3g represents the relation between 
gene g and miRNA m at the first time point, f3gm + P'gm^ with f3'g^ G f3'g, 
represents the relation at the second time point and f3gm + f^gm^ with /3^'^ G 
Pg, at the third time point. 

In order to do variable selection on the elements of fig and (3g, we intro- 
duce two additional binary matrices R' and R", with a similar role to R in 
the time- invariant model (3). We consider the elements of R' and R" inde- 
pendently distributed and following a Bernoulli distribution with parameter 
-f('"c/m = 1) = ^6 = Pi^gm — Because of the way we implement the MCMC 
(see Section 4), we do not need to impose the sequence information on the 
prior on R' and R". 

As for the elements of the /3g's vectors, we assume that the elements of 
the Pg's and f3g's vectors are stochastically independent given the regulatory 
networks R' and R", respectively, and that they have the following prior 
distributions: 

T^WgmWa^r'gm) = '■gm^(0, C" VgC) + (1 " ^gm ) =0] > 

-^WgmWg^r'gm) = '■gm^^(0, C" VgC) + (1 " r'^^)I[i3'^^=o], 

where the hyperparameter ^, usually < 1, refiects the prior information on 
the magnitude of the f3g's and /3g"s. 

We can reframe the time-dependent coefficients model in the same way 
we have framed model (3), that is, 

= ~^{R)f^9{R) ~ ^2{R')l^g{R') ~ ^3(R")l^g{R") + ^i^s ' 

where the columns of X2 are selected if the corresponding elements of R' are 
equal to 1 and the columns of X3 are selected if the corresponding elements 
of R" are equal to 1, for each equation. 
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4. Posterior inference. For posterior inference the primary interest is in 
estimating the association matrix R. Here we show that R can be estimated 
by designing a simple extension of the stochastic search procedures used for 
variable selection; see George and McCulloch (1993) and Sha et al. (2004), 
among many others. 

We use a Metropolis-Hastings within Gibbs to explore the huge model 
space and find the most influential predictors. Our model has 23 regressors 
for each of 1297 equations, that is a total of 29,831 regression coefficients for 
the time invariant model (3) and 89,493 for the time dependent model (6). 
Clearly, exploring such a huge posterior space is challenging. Here we exploit 
the sparsity of our model, that is, the belief that most of the genes are well 
predicted by a small number of regressors, and resort to a Stochastic Search 
Variable Selection (SSVS) method. A stochastic search allows us to explore 
the posterior space in an eff'ective way, quickly finding the most probable 
configurations, that is, those corresponding to the coefficients that have high 
marginal probability of r^m. = 1 , while spending less time in regions with low 
posterior probability. 

In order to design this MCMC search, we need to calculate the marginal 
posterior distribution of R by integrating out f3g from the posterior: 



X exp 



^k^{0;-UgCg,agUg), 



where Ug = (XT(^)X(^))-i, Cg = YJX^(^) - (a^Vc)!*:, and Qg = YJY, - 
CgUgCj and with kg the number of selected regressors. Here ^fcg(0; —UgCg, 
(TgUg) indicates the cdf of a multivariate normal, with mean —UgCg and 
covariance matrix (Jgllg, calculated at the zero vector. 

Our algorithm consists of three steps. The first step is based on the 
marginal posterior distribution conditioned upon ti, . . . and cjg and con- 
sists of either the addition or the deletion of one arrow in our graphical 
model or the swapping of two arrows. The second step generates new values 
of Tj's from their posterior distribution. In the last step values of all the 
error variances ag are updated. The un-normalized full conditionals needed 
for the Gibbs sampler can be derived from the conditional independencies 
of our model, as given in Figure 2. We now describe the three steps of the 
algorithm: 

1. We use one of two types of moves to update R: 

• with probability (j), we add or delete an element by choosing at random 
one component in the current R and changing its value; 
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with probability l — (j),we swap two elements by choosing independently 
at random one and one 1 in the current R and changing the value of 
both of them. 



The proposed R'^"^™ is then accepted with a probability that is the 
ratio of the relative posterior probabilities of the new versus the current 
model: 



(7) 



mm 



/(Y|X(^„ew),R-«-,a3)7r(R'^^"|r) 



/(Y|X(^oM),R°id,a3)7r(R°id|T) ' 

Because these moves are symmetric, the proposal distribution does not 
appear in the previous ratio. 
2. In order to update the Tj's, we employ Metropolis steps. The proposal is 
made via a truncated normal random walk kernel. The proposed is 
then accepted with probability 



(8) 



mm 



^(R|rf")^(Tf™)g(r: 



j 



_old. new 

'j ' 'j 



vr(R|r, 



old)^(^old 



where g(r/<^; r°<=^) is a truncated normal with mean Tj^™ and truncation 
at 0, given the constraint of positivity on Tj. The variance of this distri- 
bution represents the tuning parameter and has to be set in such a way 
to explore the parameter space and have a good acceptance rate; see also 
Section 5. 

3. For g = 1, . . . ,G, we update the error variance ag using a Metropolis step 
where the proposal distribution q{a°^'^;ag^'") is a Gamma distribution 
with parameters and b^- The proposed new value is then accepted 
with probability 



(9) 



mm 



/(Y|X(R),R,a--)7r(a--^ 



old . -.new ^ 
9 '"ff ^ 



/(Y|X(R) , R, a°ld)vr(a°ld)g(anew. ^old) 



,1 



To obtain an efficient exploration of the parameter space we set = 
ag^'^/ba and ba = e/a^'^, where e represents the variance of the proposal 
distribution and can be set to obtain a suitable acceptance ratio. 

Posterior inference can then be performed based on the MCMC output 
using the marginal probabilities of the singles rg^s. 

The MCMC algorithm for the time-dependent coefficient model (6) is 
pretty similar to the procedure described above, the main difference being 
that at the first step we update either R, R' or R". We then derive the 
marginal posterior distribution /(Yg|X(^),R) for the time dependent model 
obtaining 



/(Yg|X(^), 



-^2(_R') ' -^3(ii")' '^9) 



-In 
'9 > 
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= (27r)-("~''s)/2a~"/^C~''s~(''2s+A.-39)/2^-(fc29+fc3s)/2|^^|-l/2|(^^|-l^ 

x|i?g|-V2exp $fc^(0;-£;-iF„c7,ii;- 
with 

% = YjYg - YjgX2(i?/)^^^X^(^,)Y2g - YjgX3(^„)Cg"^Xj(^„)Y3g 

^5 = XX^ - X^(^)X2(/j/)A^ ^X^(^,)X2(R) - X3(^)X3(^//)C^ ^X3(^„-)X3(^), 
^9 = (^^(R')-^2(i^') + (cC)~^42g), 
^9 = (^3(R")-^3(i?") + (cC)~^439) 

and Yj = (Y ^, Y^, Y|^); k2g and k^g are the number of selected /3^^ and 
/5gm- We can now write the first step of the MCMC as follows: 

1'. We first select which of the three matrices to update. We choose R with 
probability A and R' (or R") with probability (1 — A)/2. We then use 
the same add/delete or swap scheme described above and we accept the 
proposed R°°" (or R'^i^;™ or R""*^^). For R the acceptance probability is 



mm 



f/'VlY Y* Y* "nncw TD/old ■r>//ol<iN_/'-r>now |_\ 

/I * l^(-R"<'");^2(i?'°ld)'^3(_R"old)>-'^ i-'^ i-'^ )'^\^ \') 



and similarly if R' or R" is selected. Note that to perform this step we 
need to use only the prior distribution of the selected matrix. 

This algorithm can be run either without any constraint on the moves 
relative to R, R' and R" or with the constraint that the elements of R' 
(or R") can be selected only when the corresponding element of R is al- 
ready selected and that the elements of R can be eliminated only when the 
corresponding element of R' and R" are not selected. For our application 
we adopted the constraint strategy. To implement this, we do not need to 
add the ratio of the proposal distributions into (7), since we use symmetric 
moves. This choice, jointly with some empirical results (not reported here), 
led us to not use association scores into the prior distribution of R' and R", 
because the selecting constraints imply that the prior probability of selecting 
the generic element (or r^'^) already depends on the association scores 
information through the prior probability on the corresponding element rg^. 
This also implies a faster computational procedure in comparison with the 
option of including the external information into the prior of R' and R". 
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5. Neural tube defects application. We now apply our model to analyze 
the data described in Section 2, combining miRNA and mRNA expression 
levels with sequence information. Our model allows us to identify significant 
miRNAs for each target, possibly along the time. 

5.1. Parameter settings. We first need to set the values of the hyper- 
parameters of the model. The parameter c of the prior distribution of the 
regression coefficients /3gm can be interpreted as a correction factor. Since 
truncating at zero a zero mean normal distribution with variance cr^ results 
in a half normal distribution with variance 0.7a^, we decided to set c = 0.7. 
Also, we specified a vague prior on ag by setting 5 = 3, the minimum value 
such that the expectation of exists, and chose c = 0.2, setting the expected 
value of the variance parameter ag comparable in size to a small percentage 
of the expected error variances of the standardized Y given X. 

In our variable selection framework, the parameter r] of the Bernoulli 
distribution (5) reflects the prior belief about the percentage of significant 
coefficients in the model. In this application, having 23 regressors for each 
of the 1297 equations, we set rj = —3 to obtain a prior expected number 
of regressors approximately equal to 1. This setting also corresponds to a 
5% prior probability of selection. In Section 5.2 we show that, even though 
T] affects the number of selected coefficients, no sensitivity to the posterior 
selection of the most influential arcs is observed. For the more computation- 
ally expensive time dependent model, we set 77 = —3 and r]h = 0.05, to avoid 
memory problems. We also set the hyperparameters Ur = 1.5 and br = 0.2 
to obtain a Gamma distribution that gives high probability to a broad set 
of values of ri , . . . , T5 , taking into account the scale of values that come 
from PicTar and the other algorithms. However, the posterior distributions 
we obtained, in all the different chains we ran, showed that the parameter 
setting of the Gamma distribution is not strongly informative. When run- 
ning MCMCs we have set the variance of the truncated normal proposal 
distribution of tj equal to 0.01 to obtain an acceptance rate close to 25%. 

We ran two different chains for each of the four possible combinations, 
the time invariant model for the control and the hyperthermia group and 
the time dependent model for the control and the hyperthermia group. We 
used either adding/deleting or swapping moves with equal probability at 
each step of the chain; we assigned a probability of A = 0.5 to the move that 
updates R and then probability 0.25 to each of the moves that update R' 
and R". In all cases, after the initial burn-in, both chains mostly explored 
the same region of the parameter space corresponding to configuration of 
R with high posterior probability. In general, we found good agreement 
between the two chains, which were run from different starting points. To 
be more precise, correlations between the posterior probabilities of the two 
chains ranged from 0.89 to 0.91. 
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Fig. 3. Trace plot for number of selected arrows and for the log-posterior probability for 
the time invariant model. 



Figure 3 gives the summary trace plots for the number of selected coeffi- 
cients f3gra and corresponding log-posterior probabilities for the time invari- 
ant model on the hyperthermia group. In this case the chain was run for 
three million iterations, from a starting randomly chosen set of 1000 arrows, 
and mostly visited models with 1000-1200 edges, that is, on average roughly 
1 edge per gene, a number not too far from the prior specification. 



5.2. Results. The huge number of potential coefficients in the model im- 
plies that the weight of a single coefficient toward the posterior probability of 
the entire model can be potentially very small. Also, due to sparsity, there 
may be many models with almost the same (small) posterior probability. 
Because of this, it is good practice to perform posterior inference based on 
the marginal posterior probability of the single coefficients, rather than on 
their joint distribution. These posterior probabilities of the presence of sin- 
gle interactions, that is, P{ Tgjyi — 1|Y,X), can be estimated directly from 
the MCMC samples by taking the proportion of MCMC iterations for which 

^gm — 1' 

The small sample size of our experimental groups does not allow us to 
create a validation set and, therefore, all the samples are used to fit the 
model. Selected models are then evaluated based on the B? statistic, calcu- 
lated using the posterior mean of regression coefficients. As expected, when 
more covariates are included into the model, based on their posterior proba- 
bilities, the statistic E? increases. We observed that coefficients with highest 
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posterior probability explain most of the variability, while the increment in 
B?' becomes marginal when adding coefficients with relatively low posterior 
probability. We take this as an indication of the fact that the ordering cre- 
ated by the posterior probabilities correctly maps the significant variables. 
For the time invariant model a threshold of 0.15, corresponding to 1224 
included edges, gave an of 0.36, for the control group, and of 0.44 for 
the hyperthermia group, with 1473 included edges. Identical behavior was 
observed for the additional coefficients of the time dependent model, that 
is, when the number of included /3''s and /3"'s increases, then the quality 
of the fitting improves; with a threshold of 0.15 for /3's and a threshold of 
0.1 for /3"s and /3"'s, we obtain a B? = 0.45 for the control group, including 
1826 /3's, 366 /3"s and 222 /3'"s, and a i?^ = 0.47 for the hyperthermia group, 
including 2173 /3's, 439 /3"s and 296 /3"'s. 

In an effort to assess whether our model correctly selects miRNAs that 
under-regulate target genes, we also calculated the ordinary least square esti- 
mates of the regression coefficients and checked how many of them were neg- 
ative; see the Appendix for the calculation of the OLS estimates. Notice that 
this approach does not impose the negative constraint on /3's. By including 
all coefficients with posterior probability greater than 0.2 (0.15), we found 
that, for the control and hyperthermia group, respectively, 96.4(93.5)% and 
95.0(90.4)% of the estimated coefficients were correctly negative. 

Important pairs of target genes and miRNAs can be selected as those 
corresponding to arrows with highest posterior probabilities. For example, by 
exploring the regulatory network as a function of this posterior probability 
of the arrows, we found that, for the time invariant model on the control 
group, a posterior probability cutoff of 0.8 selected 88 arrows between 88 
target genes and 7 miRNAs. These correspond to an expected rate of false 
detection (Bayesian FDR) of 7.5%, that we calculated, following Newton et 
al. (2004), as 

FDR = C{k)/\J^\, 

where C(k) = Y.g,m'^9mI[^g^<K] and il^gm = 1 - P{rgm = 1|Y,X), with \ J^\ 
the size of the list (| J^l = Ylg m h'^gm<K])- set k = 1 — A; with k the chosen 
threshold (i.e., 0.8). For the same cutoff the time-dependent analysis on the 
control group showed 11 miRNAs with at least one target gene for a total 
of 107 gene targets. There were 7 miRNA and 75 gene targets in common 
between the time-independent and time-dependent analyses of the control 
data. For the hyperthermia-treated group, the time-invariant model with a 
0.8 cutoff led to 93 selected arrows, between 91 target genes and 11 miRNAs, 
corresponding to a Bayesian FDR of 9.0%. The time-dependent analysis 
showed 12 miRNAs with at least one target gene for a total of 120 gene 
targets. There were 10 miRNA and 77 gene targets in common between the 
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Fig. 4. Selected network for the hyperthermia group using a threshold of 0.8 on the 
posterior probability. 

time-independent and time-dependent analysis of the hyperthermia-treated 
data. 

Figure 4, produced using GraphExplore of Wang, Dobra and West (2004), 
displays the selected network for the hyperthermia group and a threshold of 
0.8 on the posterior probability under the time invariant model. A close look 
at the pairs of target genes and miRNAs with high posterior probabilities 
reveals that some of the regulatory relationships seem plausible and warrant 
future investigation. For example, links between miR-367 and target Egr2 
and Mobl, selected with posterior probability of 0.97 and 1, are particularly 
interesting, as described below. 

Increasing the cutoff value on the posterior probabilities clearly reduces 
the number of selected arrows and results in lower Bayesian FDRs. For 
example, a cutoff value of 0.9 identified 60 arrows between 60 target genes 
and 6 miRNAs, corresponding to a Bayesian FDR of 4.1%, for the control 
group, and 50 selected arrows, between 50 target genes and 9 miRNAs, 
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corresponding to a Bayesian FDR of 3.9%, for the hyperthermia-treated 
group. 

Overall, there were 252 unique miRNA-gene target associations identi- 
fied with a posterior probability of at least 0.8, including 15 miRNAs and 
221 gene targets. Of the 252 miRNA-gene target associations identified, 35 
were predicted by miRanda only, 26 by PicTar only, zero by total Target 
Scan only, 14 by aggregate Target Scan only, 8 by PITA only, and 45 by at 
least one of the five algorithms considered. 108 of the gene targets identified 
were associated with miR-367, a pluripotency-specific marker in human and 
mouse ES cells [Li et al. (2009)], while 27 of the gene targets were associ- 
ated with miR-423, which has previously been shown to be expressed in the 
adult and/or developing brain [Zhang and Pan (2009)]. Expression of both 
miR-367 and 423 decreased over time in control and hyperthermia-treated 
embryos, which is consistent for a marker of pluripotency in a differentiat- 
ing embryo. While decreasing, the expression levels of these miRNAs were 
higher after hyperthermia exposure when compared to controls, which may 
indicate a delay in the differentiation program. In addition, 62 of the gene 
targets were associated with miR-299-5p, which has been shown to regulate 
de novo expression of osteopontin, a protein that plays a role in enhancing 
proliferation and tumorigenicity [Shevde et al. (2010)]. 

Among the target genes identified, 12 genes associated with brain devel- 
opment or expressed in brain/whole embryo (including Egr2, Hnflb, and 
Mobl) were associated with miR-367 with a posterior probability in 0.82- 
1.0 in the time-dependent analysis of hyperthermia-treatment data at the 
5-hour time point. 11 of these gene target associations also had a posterior 
probability in 0.68-1.0 with the time-independent analysis of hyperthermia 
treatment data. At the 5-hour time point after hyperthermia treatment, 
miR-367 expression increased 1.7- fold, while expression of these associated 
gene targets decreased 1.1-2 fold when compared to control-treatment. This 
pattern of expression might be indicative of down-regulation of these gene 
targets by increased expression of miR-367 in response to hyperthermia 
treatment. Such gene-miRNA associations, identified by our methods as pos- 
sibly related to brain and embryo development, are interesting to pursue in 
follow-up NTD studies. 

It is also interesting to look at the inference on the regression coefficients. 
Figure 5 shows the estimates of the significant /3gm's for the time invari- 
ant model under hyperthermia condition. Each bar in the plot represents 
the 1297 regression coefficients for one of the 23 miRNAs. Nonzero values 
correspond to the posterior mean estimates of the best Pgm^^ with poste- 
rior inclusion probability above 0.15 (all other /3's are estimated by zero). 
Notice, for example, how miRNAs miR-423, corresponding to the 20th bar, 
and miR-367, corresponding to the 18th bar, play an important role in the 
down-regulatory mechanism. 



20 



F. C. STINGO ET AL. 



I — I — I — I — I — I I — I — I — I — I — I I — I — I — I — I — I I — I — I — I — I — I I — I — I — I — I — I I — I — I — I — I — I 

-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 



I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 

-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 



I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 

-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 



I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 

-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 0-10 -6 -4 -2 



Fig. 5. Estimation of the significant figm 's for the time invariant model under hyper- 
thermia condition. The y-axis indicates the 1297 targets, listed in the same order in all 23 
plots. 
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Fig. 6. Density Kernel estimate using the time independent model for the control group. 

Let us now comment on the inference on ri , . . . , rs . These parameters 
measure the influence of the prior information on the posterior inference. 
In general, we noticed that posterior inference on these parameters showed 
some sensitivity to the value assigned to r/. When selecting edges the hyper- 
parameter r] represents the weight assigned to the data and, consequently, 
Ti , . . . , T5 play the role of the weight of the prior sequence information de- 
rived from the five used algorithms. The bigger the value of ry, the more 
the posterior distribution of Tj will be concentrated around small values. 
Besides this general rule, inference on the Tj's generally depends on the con- 
cordance between data and prior information, the number of observations 
and the number of parameters in the model. As an example, the behavior 
of the posterior distribution of ri (the parameter associated to PicTar), for 
different values of r/, is summarized in Figure 6. The scale of the estimates 
compensates the very large values we observe for some of the PicTar scores. 
We can clearly see how the posterior distribution concentrates on bigger 
values when r/ decreases. To evaluate the influence of the different scores, we 
calculated how the prior probabilities (5) increase, on average, for a set of 
pairs target-miRNA with high scores. With r/ = —3.5 the prior probability 
of fgm — 1 increase by 35.9%, while when setting r] — —2.5 this increment is 
equal only to 8.7%. 

Our results suggested that information extracted from PicTar is the most 
influential on the posterior inference. MiRanda and Target Scan aggregate 
also contribute somehow to the selection process, while Target Scan total 
and PITA do not affect the posterior inference. Figure 7 shows the posterior 
inference for the three most influential algorithms when = — 3. The other 
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Fig. 7. Density Kernel estimate using the time independent model for the control group 
(left panel) and the time dependent model for the hyperthermia group (right panel). 



two algorithms resulted in posterior densities that were very concentrated 
around zero (plot not shown). Notice that, while the importance order does 
not change, the 3 algorithms are generally more influential in the case of 
the time dependent model for the hyperthermia group (right panel). Also, 
Target Scan aggregate seems to have an increased effect for this case, com- 
pared to the model with time invariant coefficients on the control group (left 
panel). In general, we found that with rj = —3 the posterior distributions of 
Ti,. . . ,T5, for the control group, are concentrated around values that imply 
a 16.4% increase on the prior probability of = for high scores. For the 
hyperthermia group the corresponding percentage, as consequence of the in- 
creased influence of PicTar and Target Scan aggregate, increases to 37.8%. 
When using the time dependent model the prior probability of rgm = 1 in- 
creases by 32.6% for the control group and, due mostly to the increased 
effect of PicTar and Target Scan aggregate, by 118.2% for the hyperthermia 
group. 

Because different sequence/structure based methods do not result in ex- 
actly the same set of miRNA target predictions, we decided to perform a 
systematic analysis to understand how different prior target predictions can 
influence the flnal inference. Accordingly, we selected the control group and 
the model with time invariant coefficients and repeated our analysis by in- 
tegrating in the prior formulation PicTar, miRanda, Target Scan aggregate. 
Target Scan total and PITA scores, one set at the time. We then compared 
the selected arrows obtained integrating single sets to those selected by in- 
tegrating all five available sets of scores. Using r] = —3, we found that the 
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network selection is only partially affected by the different data integrations. 
In particular, by considering the first 200 arrows, selected according to the 
posterior probability, we found that the set of selected ones using PicTar 
overlaps at the 73.5% with the set selected using all 5 databases; this per- 
centage is equal to 66.0% for miRanda, to 75.0% for Target Scan aggregate, 
to 70.0% for Target Scan total and 72.5% for PITA. If we instead consider 
the first 1000 arrows, then these percentages get lower (66.4% for PicTar, 
60.0% for miRanda, 68.8% for Target Scan aggregate, 65.8% for Target Scan 
total and 68.2% for PITA). These results indicate that although the selec- 
tion is mostly data driven, especially when restricted to the most important 
arrows, it is also partially affected by the integrated sets. 

6. Conclusions. We have proposed a Bayesian graphical modeling ap- 
proach that infers the miRNA regulatory network by integrating expression 
levels of miRNAs with their potential mRNA targets and, via the prior 
probability model, with their sequence/structure information. Our model 
is able to incorporate multiple data sources directly into the prior distribu- 
tion, avoiding arbitrary prior data synthesis. We have used stochastic search 
variable selection methods to infer the miRNA regulatory network. We have 
considered experimental data from a study on a very well-known develop- 
mental toxicant causing neural tube defects, hyperthermia. The analysis has 
involved 23 mouse miRNAs and a total of 1297 potential targets. Our goal 
was to identify a small set of potential targets with high confidence. Some of 
the pairs of target gene and miRNA selected by our model seem promising 
candidates for future investigation. In addition, the time-dependent model 
has achieved significant improvement in the percentage of explained vari- 
ance, only slightly increasing the size of the selected model. Our proposed 
modeling strategy is general and can easily be applied to other types of 
network inference by integrating multiple data sources. 

An interesting feature of our inference is that there is only a partial over- 
lap between the connections selected by our model and those indicated by 
the sequence/structure algorithms. This phenomenon has been observed by 
other authors in models for data integration. Wei and Li (2008), for example, 
attribute this to the fact that our knowledge of biological processes is not 
complete and can potentially include errors and therefore induce misspeci- 
fied edges on the networks. They also suggest to first check the consistency of 
the prior information with the available data. In our case, if the correlation 
between a miRNA and a target gene is very small, we may want to remove 
the edge from the network. On the other hand, given the limited number 
of observations typical of experimental studies in genomics, it would seem 
important to retain as much, possibly accurate, prior information as possi- 
ble. This important aspect of models for data integration certainly deserves 
future investigation. 
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Extensions and generalizations of our model are possible. One future av- 
enue we intent to pursue is trying to relax the assumption on the conditional 
independence of the targets given the miRNAs. This assumption is neces- 
sary in order to integrate out the covariance matrix, as in Brown, Vannucci 
and Fearn (1998), and still allow the selection of individual relations be- 
tween a gene and a miRNA. Looking at this as a computational issue, it 
may be possible to still sample the values of this huge covariance matrix in 
the MCMC, perhaps by reducing the number of nonzero elements via the 
prior information on the gene network. 

APPENDIX A: POSTERIOR INFERENCE ON REGRESSION 

COEFFICIENTS 

If inference on regression coefficients is desirable, these can estimate either 
via the posterior distributions or the least squares estimates. For model (1) 
we have the following posterior distribution: 

(10) 7T{/3g\Y,X^j,:,,uj^) ^ HN-^{Ugaj,agUg), 

where HN^ indicates a kg-vahate half-normal distribution that gives posi- 
tive probability only to vectors formed by elements bigger than zero. 

For the more general time-dependent model we have the following poste- 
rior distributions: 

(7ril3g\Y,X^ji),u;^)^HN+{E-^Fg,agE-^), 
^ ^ \7ri/3'^\Y,X^j,:^,u;^)r^N{J-^Hg,agJ-'), 

with 

= ^Ig^SiR") 

+ (yJX(^) - YjgX2(R/)£'~^X^(.^,-)X2(/j) +o-y^c"Hfcg)Lg^X^(^)X3(fl//), 
^g = ^2iR')^2{R') + {(cy^hig, 

^g = ~ ^2iR)^2{R')Dg ^X^(^,-)X2(/j). 

The posterior distribution of /?' has the same form as the posterior distribu- 
tion of /?". Using the least squares approach, instead, we obtain the following 
equations for f3, f3' and f3": 

PgLS = 0^fR)^{R)) ^^'(R)0^g - ^2(R')l^g + ^3(iJ")^9)' 
l^'gLS = 0^2{R')^2{R')) ^2(R')0^2g " ^2(R)l^g)i 
l^"gLS = i^l[R»y^3(R")) ^l[R»)0^3g " ^3{R)l^g)^ 
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and then 

+ ^3(ii")(-^3(i?")-^3(if")) ^^3(R")-^35)], 



with 

= 4g - (X^)X(R)) ^X^)(X2(^,-)(X2(^,-)X2(H/)) ^X^(^,-)X2(ij) 

+ ^3{R")0^3{R")^3{R")) ^^3{R")^3{R))- 
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